To perform online iNMF, we need to install the online branch of liger from GitHub, as shown below.
library(devtools)
install_github("MacoskoLab/liger", ref = "online")
We first create a liger object by passing the filenames of HDF5 files containing the raw count data. The data can be downloaded here.
library(liger)
pbmcs = createLiger(list(stim = "stim_PBMCs.h5", ctrl = "ctrl_PBMCs.h5"))
We then perform the normalization, gene selection, and gene scaling in an online fashion, reading the data from disk in small batches.
pbmcs = normalize(pbmcs)
pbmcs = selectGenes(pbmcs, var.thresh = 0.2, do.plot = FALSE)
pbmcs = scaleNotCenter(pbmcs)
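The two-pass idea behind online preprocessing can be sketched in base R. This is a minimal sketch, not rliger's implementation, and it rests on three assumptions:

- normalize divides each cell's counts by that cell's total count.
- scaleNotCenter divides each gene by its root mean square across cells, without centering.
- An in-memory matrix split into column chunks stands in for batches read from an HDF5 file.

Gene selection is omitted. The chunk size is a hypothetical value.

```r
set.seed(42)
# Toy count matrix: 100 genes x 1000 cells (stand-in for one HDF5 dataset)
counts <- matrix(rpois(100 * 1000, lambda = 2), nrow = 100)

chunk_size <- 250  # hypothetical number of cells read per batch
chunks <- split(seq_len(ncol(counts)), ceiling(seq_len(ncol(counts)) / chunk_size))

# Assumed normalization: divide each cell (column) by its library size
normalize_chunk <- function(x) sweep(x, 2, colSums(x), "/")

# Pass 1: normalize chunk by chunk and accumulate per-gene sums of squares
sum_sq <- numeric(nrow(counts))
for (cols in chunks) {
  x_norm <- normalize_chunk(counts[, cols, drop = FALSE])
  sum_sq <- sum_sq + rowSums(x_norm^2)
}
gene_rms <- sqrt(sum_sq / (ncol(counts) - 1))
gene_rms[gene_rms == 0] <- 1  # avoid dividing all-zero genes by zero

# Pass 2: re-read each chunk and scale genes by their root mean square (no centering)
scaled <- do.call(cbind, lapply(chunks, function(cols) {
  normalize_chunk(counts[, cols, drop = FALSE]) / gene_rms
}))
```

Only one chunk plus a per-gene vector needs to be held at a time, which is why the scaling statistics can be computed without loading the whole dataset.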
Now we can use online iNMF to factorize the data, again reading minibatches from the HDF5 files only as they are needed. Here k sets the number of factors and max.epochs the number of passes through the data.
pbmcs = online_iNMF(pbmcs, k = 20, max.epochs = 5)
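To show why minibatches suffice, here is a simplified online NMF in base R, in the spirit of online dictionary learning. It is not rliger's code.

- **What it drops.** It covers a single dataset and omits the dataset-specific factors V and the lambda penalty of iNMF.
- **How it updates W.** Each minibatch only contributes to the running statistics A (the sum of H Hᵀ) and B (the sum of X Hᵀ). W is then updated column by column (HALS-style) from A and B alone.
- **Simplification.** The statistics keep accumulating across epochs. This is an assumption of the sketch and not necessarily what online_iNMF does.

The data, batch size and iteration counts are hypothetical.

```r
set.seed(1)
# Toy nonnegative data (genes x cells) with a known rank-5 structure
n_genes <- 50; n_cells <- 400; k <- 5
X <- matrix(runif(n_genes * k), n_genes, k) %*% matrix(runif(k * n_cells), k, n_cells)

# Nonnegative least squares for H given W, by HALS (block coordinate descent)
solve_H <- function(W, Xb, iters = 50) {
  WtW <- crossprod(W)          # k x k
  WtX <- crossprod(W, Xb)      # k x cells
  H <- matrix(runif(ncol(W) * ncol(Xb)), ncol(W), ncol(Xb))
  for (it in seq_len(iters)) {
    for (j in seq_len(ncol(W))) {
      H[j, ] <- pmax(0, H[j, ] + (WtX[j, ] - WtW[j, ] %*% H) / max(WtW[j, j], 1e-12))
    }
  }
  H
}

# Simplified online NMF: one minibatch of cells at a time
online_nmf <- function(X, k, batch_size = 50, epochs = 5) {
  W <- matrix(runif(nrow(X) * k), nrow(X), k)
  A <- matrix(0, k, k)          # running sum of H %*% t(H)
  B <- matrix(0, nrow(X), k)    # running sum of X_batch %*% t(H)
  for (ep in seq_len(epochs)) {
    perm <- sample(ncol(X))     # shuffle cells each epoch
    for (start in seq(1, ncol(X), by = batch_size)) {
      cols <- perm[start:min(start + batch_size - 1, ncol(X))]
      Xb <- X[, cols, drop = FALSE]   # only this minibatch is needed here
      H <- solve_H(W, Xb)
      A <- A + tcrossprod(H)
      B <- B + Xb %*% t(H)
      # Update each column of W using only the accumulated statistics
      for (j in seq_len(k)) {
        W[, j] <- pmax(0, W[, j] + (B[, j] - W %*% A[, j]) / max(A[j, j], 1e-12))
      }
    }
  }
  W
}

W <- online_nmf(X, k)
H_all <- solve_H(W, X, iters = 100)
rel_err <- norm(X - W %*% H_all, "F") / norm(X, "F")
```

The memory cost of the W update depends on the number of genes and k, not on the number of cells. This is what allows the factorization to stream over large HDF5 files.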
After the factorization, we can align the datasets with quantile normalization.
pbmcs = quantile_norm(pbmcs)
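The core of quantile alignment can be sketched for a single factor in base R. liger's quantile_norm does more than this. Roughly, it groups cells by their maximum factor loading and aligns quantiles factor by factor within those groups, relative to a reference dataset.

This sketch only maps one dataset's loadings for one factor onto a reference distribution with a monotone piecewise-linear function. The toy data and n_quantiles are hypothetical.

```r
set.seed(7)
# Toy loadings for one factor in two datasets on different scales
ref <- rexp(1000, rate = 1)    # reference dataset
x   <- rexp(800, rate = 0.2)   # dataset to be aligned

# Map x's quantiles onto ref's quantiles and interpolate linearly in between
quantile_align <- function(x, ref, n_quantiles = 50) {
  probs <- seq(0, 1, length.out = n_quantiles + 1)
  qx <- quantile(x, probs, names = FALSE)
  qr <- quantile(ref, probs, names = FALSE)
  approx(qx, qr, xout = x, rule = 2, ties = mean)$y
}

aligned <- quantile_align(x, ref)
```

Because the map is monotone, the ordering of cells within a dataset is preserved while the distributions become comparable across datasets.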
We can also visualize the cell factor loadings in two dimensions using t-SNE or UMAP.
pbmcs = runUMAP(pbmcs)
plotByDatasetAndCluster(pbmcs, axis.labels = c("UMAP1","UMAP2"))
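Online iNMF can also incorporate datasets that arrive over time. We first factorize an initial dataset (allen_smarter_cells.h5) and save its variable genes as MOp.vargenes for later use.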
MOp = createLiger(list(cells = "allen_smarter_cells.h5"))
MOp = normalize(MOp)
MOp = selectGenes(MOp, var.thresh = 2)
MOp.vargenes = MOp@var.genes
MOp = scaleNotCenter(MOp)
MOp = online_iNMF(MOp, k = 40, max.epochs = 1)
MOp = quantile_norm(MOp)
MOp = runUMAP(MOp)
plotByDatasetAndCluster(MOp, axis.labels = c("UMAP1","UMAP2"))
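When a new dataset arrives (here, nuclei from allen_smarter_nuclei.h5), we normalize it and scale it using the variable genes selected from the first dataset. We then pass the preprocessed object to online_iNMF through X_new, which incorporates it into the existing factorization.
MOp2 = createLiger(list(nuclei = "allen_smarter_nuclei.h5"))
MOp2 = normalize(MOp2)
MOp2@var.genes = MOp@var.genes
MOp2 = scaleNotCenter(MOp2)
MOp = online_iNMF(MOp, X_new = list(nuclei = MOp2), k = 40, max.epochs = 1)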
MOp = quantile_norm(MOp)
MOp = runUMAP(MOp)
plotByDatasetAndCluster(MOp, axis.labels = c("UMAP1","UMAP2"))
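Setting MOp2@var.genes = MOp@var.genes matters because the new dataset's rows must line up, gene for gene and in the same order, with the shared factors learned from the first dataset. The base-R sketch below shows that requirement on a toy matrix.

subset_to_genes is a hypothetical helper, not a liger function, and the gene names and counts are made up. liger handles this internally through var.genes.

```r
# Hypothetical helper: restrict a new dataset (genes x cells) to a saved gene list
subset_to_genes <- function(mat, genes) {
  absent <- setdiff(genes, rownames(mat))
  if (length(absent) > 0) {
    stop("genes missing from new dataset: ", paste(absent, collapse = ", "))
  }
  mat[genes, , drop = FALSE]   # same genes, in the same order as the saved list
}

saved_genes <- c("Gad1", "Slc17a7", "Snap25")   # toy stand-in for MOp.vargenes
new_data <- matrix(1:12, nrow = 4,
                   dimnames = list(c("Snap25", "Actb", "Gad1", "Slc17a7"),
                                   paste0("nuc", 1:3)))
aligned <- subset_to_genes(new_data, saved_genes)
```

Failing loudly on missing genes is a design choice of this sketch. Silently dropping rows would misalign the new data with W.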
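Alternatively, we can project a new dataset onto an existing factorization without updating the factors. We rebuild the object from the cells dataset alone, reusing the saved variable genes, and factorize it as before.
MOp = createLiger(list(cells = "allen_smarter_cells.h5"))
MOp = normalize(MOp)
MOp@var.genes = MOp.vargenes
MOp = scaleNotCenter(MOp)
MOp = online_iNMF(MOp, k = 40, max.epochs = 1)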
MOp = quantile_norm(MOp)
MOp = runUMAP(MOp)
plotByDatasetAndCluster(MOp, axis.labels = c("UMAP1","UMAP2"))
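Setting projection = TRUE then computes factor loadings for the nuclei using the existing shared factors, leaving the factorization learned from the cells unchanged.
MOp = online_iNMF(MOp, X_new = list(nuclei = MOp2), k = 40, projection = TRUE)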
MOp = quantile_norm(MOp)
MOp = runUMAP(MOp)
plotByDatasetAndCluster(MOp, axis.labels = c("UMAP1","UMAP2"))
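Projection can be sketched as a nonnegative least-squares problem with the factors held fixed. For each new cell, we find loadings H ≥ 0 that minimize ‖X_new − W H‖. This base-R sketch assumes projection uses only the shared factors W. W here is random toy data standing in for the learned factors, and project_nnls is a hypothetical helper solved by HALS coordinate descent.

```r
set.seed(3)
n_genes <- 50; k <- 5
W <- matrix(runif(n_genes * k), n_genes, k)   # toy stand-in for the learned, fixed W
H_true <- matrix(runif(k * 200), k, 200)
X_new <- W %*% H_true                         # toy new dataset: genes x cells

# Hypothetical helper: solve min ||X - W H||_F^2 subject to H >= 0, with W fixed
project_nnls <- function(W, X, iters = 200) {
  WtW <- crossprod(W)
  WtX <- crossprod(W, X)
  H <- matrix(0, ncol(W), ncol(X))
  for (it in seq_len(iters)) {
    for (j in seq_len(ncol(W))) {
      H[j, ] <- pmax(0, H[j, ] + (WtX[j, ] - WtW[j, ] %*% H) / WtW[j, j])
    }
  }
  H
}

H_new <- project_nnls(W, X_new)
rel_err <- norm(X_new - W %*% H_new, "F") / norm(X_new, "F")
```

Because W is never modified, projecting a new dataset cannot change the embedding of cells that were already factorized.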